    Deceptive Opinions Detection Using New Proposed Arabic Semantic Features

    Some users post false reviews to promote or devalue others' products and services. This practice is known as deceptive opinion spam: spammers seek to profit from posting untruthful reviews. We therefore develop and implement new semantic features to improve deception detection in Arabic. The features are inspired by the study of discourse parsing and rhetorical relations in Arabic. Given the importance of the phrase unit in the Arabic language and in grammatical studies, we analyzed and selected the most frequently used unit markers and relations to compute the proposed features, which are then used to represent the review texts in the classification phase. Several previous works have found the Support Vector Machine (SVM) classifier to be the most accurate technique in this area. However, annotated Arabic resources remain scarce, especially for deception detection, which is still a new research area. We therefore use a semi-supervised SVM, which exploits unlabeled data to overcome this problem.
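The idea of computing features from discourse unit markers can be sketched as follows. This is a minimal illustration, not the paper's feature set: the marker list `DISCOURSE_MARKERS` and the function `marker_features` are hypothetical, and plain whitespace tokenization is assumed, which ignores Arabic clitics attached to a marker.

```python
from collections import Counter

# Hypothetical marker list for illustration only; the abstract does not
# give the paper's actual inventory of unit markers.
DISCOURSE_MARKERS = ["لكن", "لأن", "ثم"]  # "but", "because", "then"


def marker_features(review):
    """Return the relative frequency of each marker among the review's tokens."""
    tokens = review.split()  # naive whitespace tokenization (assumption)
    if not tokens:
        return [0.0] * len(DISCOURSE_MARKERS)
    counts = Counter(tokens)
    return [counts[m] / len(tokens) for m in DISCOURSE_MARKERS]


# One feature vector per review; such vectors could then feed a classifier.
features = marker_features("المنتج جيد لكن السعر مرتفع")
```

Each review becomes a fixed-length vector, one entry per marker, which is the kind of representation a classifier such as an SVM expects.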

    Unbalanced Learning for Early Automatic Diagnosis of Diabetes Based on Enhanced Resampling Technique and Stacking Classifier

    Diabetes is characterized by an abnormally high concentration of glucose in the blood serum. It damages several vital body systems, mainly the cardiovascular, renal, and visual systems. Automated screening allows early diagnosis of certain illnesses (such as diabetes), which generally increases the chances of successful treatment. Machine learning has developed considerably in the domain of medical diagnosis, especially diabetes diagnosis, thanks in part to the integration of unbalanced learning, which considerably reduces erroneous classification results. This general concept is addressed from two perspectives: at the data level, by modifying or balancing the learning data set, and at the algorithm level. The present paper takes a hybrid approach to imbalanced learning, proposing an enhanced multimodal meta-learning method called IRESAMPLE+St to distinguish between normal and diabetic patients. The approach relies on the Stacking paradigm, exploiting the complementarity that may exist between classifiers. A modified RESAMPLE-based technique, referred to as IRESAMPLE+, and the SMOTE method are integrated as a preliminary resampling step to address the problem of unbalanced data. The imbalanced Pima Indian Diabetes (PID) data set is balanced with the proposed IRESAMPLE+ method, which operates as both an oversampling and an undersampling technique, thereby reinforcing the diagnostic accuracy of the Stacking classifier. The suggested IRESAMPLE+St provides a computerized diabetes diagnostic system with an Accuracy of 99.87%, Sensitivity of 100%, Specificity of 99.70%, and AUROC of 99.90%; these results are compared with those of the principal related studies. The strong results reflect the design and engineering successes achieved with the IRESAMPLE+St system for the classification of diabetes.
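As a rough illustration of the oversampling step mentioned above, the following sketch generates synthetic minority samples in the spirit of SMOTE, by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, generic version and not the paper's IRESAMPLE+ method; the function name `smote_like` and its parameters are assumptions made for this example.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority_class = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
new_points = smote_like(minority_class, n_new=4)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class.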

    Particle Swarm Optimization Based Swarm Intelligence for Active Learning Improvement: Application on Medical Data Classification

    © 2020, Springer Science+Business Media, LLC, part of Springer Nature. Semi-supervised learning targets the common situation where labeled data are scarce but unlabeled data are abundant. It uses unlabeled data to help supervised learning tasks. In practice, it may make sense to use active learning in conjunction with semi-supervised learning. That is, we might allow the learning algorithm to pick a set of unlabeled instances to be labeled by a domain expert, which are then used as the labeled data set. However, existing approaches are computationally expensive and require searching through the entire unlabeled dataset, which may contain redundant instances that provide no useful information to the classifier and can decrease its performance. To address this optimization problem, a hybrid system that combines active learning (AL) and particle swarm optimization (PSO) is proposed to reduce the cost of labeling while building a more efficient classifier. The novelty of this work resides in the integration of a bio-inspired optimization algorithm into the machine learning strategy. Furthermore, a novel uncertainty measure is integrated into the particle swarm optimization algorithm as an objective function to select, from massive amounts of medical instances, those deemed most informative. To evaluate the effectiveness of the proposed approach, eighteen (18) benchmark datasets were used, and the approach was compared against three best-known classifiers with different learning paradigms: AL–NB, an active learning algorithm using a Naïve Bayes classifier and the Margin Sampling strategy; SVM (Support Vector Machine) and ELM (Extreme Learning Machine) with supervised learning; and TSVM (Transductive Support Vector Machine) with semi-supervised learning. Experiments showed that the proposed approach is effective in reducing the effort required from experts for medical data annotation to produce an accurate classifier.
The active learning approach has been used to optimize the expensive task of labeling. Based on a novel uncertainty measure, the nature-inspired PSO algorithm attempts to select, from massive amounts of unlabeled medical instances, those considered informative, while at the same time improving classifier performance. The experiments confirm that the proposed strategy significantly enhances the performance of the AL algorithm compared with the commonly used uncertainty strategies. It achieves performance similar to that of fully supervised and semi-supervised algorithms while requiring much less labeling. As a future extension of this work, it would be interesting to integrate other evolutionary optimization algorithms and compare them with our approach. In addition, it would be beneficial to test the impact of using other variants of the PSO algorithm in our approach. We also aim to test more classification algorithms in the experimentation process.
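The Margin Sampling strategy named above, used by the AL–NB baseline, can be sketched as follows. This shows only the commonly used baseline uncertainty strategy, not the paper's novel uncertainty measure or its PSO search; the function names and the example probabilities are assumptions made for illustration.

```python
def margin(probs):
    """Difference between the two largest class probabilities.

    A small margin means the classifier is torn between two classes.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_most_uncertain(pool_probs, n_queries):
    """Return indices of the n_queries pool instances with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:n_queries]


# Hypothetical predicted class probabilities for four unlabeled instances.
pool = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.5, 0.5]]
to_label = select_most_uncertain(pool, n_queries=2)
```

The selected instances are the ones a domain expert would be asked to label next. Ranking the whole pool this way is the exhaustive search that the abstract describes as computationally expensive.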
